Stuart Slavin

Progress Report

Note: the most recent work is in NN_3

This project involves using neural networks to identify handwritten digits. A neural network computes weighted sums of its inputs and adjusts those weights based on training data, allowing it to learn complex tasks. In this case, the program will take 8x8 grayscale images of digits and attempt to associate a number with each image. The base question for this project is whether a program can be created that identifies numbers with moderate accuracy, significantly better than random guessing. I will also look at how the program can be made the most "efficient", balancing speed and accuracy by varying the number of hidden neurons, the learning constant, and so on. Additionally, I will try different "sigmoid" functions to see which ones, if any, improve the learning process.

In order to verify that the program is working as expected, I have split the data into separate parts. The first 1200 images will be used as training data, and the remaining 597 will be used to test the program's accuracy. The accuracy will be output and graphed over several epochs of training; a successful program will exhibit high accuracy in later epochs.

The network itself will be set up with 64 inputs (one for each pixel, varying from 0.0 to 1.0 based on the grayscale value), an as-yet-undetermined number of hidden neurons (possibly in multiple layers), and 10 outputs (one for each digit, with the highest value being treated as the network's answer). The structure of the hidden layers is the subject of the first additional question.


In [1]:
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets
# load the 1797 8x8 digit images (digits.data) and labels (digits.target)
digits = datasets.load_digits()

In [4]:
"""One of the key parts of the program is the sigmoid function,
which is used in the activation function as the output of the weighted inputs."""
def sigmoid(z):
    return 1/(1 + np.exp(-z))

plt.plot(np.linspace(-10, 10, 100), sigmoid(np.linspace(-10, 10, 100)))


Out[4]:
[<matplotlib.lines.Line2D at 0x7ff844e8f208>]

In [ ]:
'''These put the training data in the right format. The raw pixel values
(0-16) are scaled to the 0.0-1.0 range described above.'''
trainingdata = digits.data[0:1200] / 16.0
traininganswers = digits.target[0:1200]
lc = 0.02

'''This converts the answers into length-10 vectors representing the
output layer, e.g. 6 -> [0, 0, 0, 0, 0, 0, 1, 0, 0, 0]'''
traininganswervectors = np.zeros((len(digits.target), 10))
for n in range(len(digits.target)):
    traininganswervectors[n][digits.target[n]] = 1
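
The remaining 597 images form the test set described at the start; a minimal sketch of that split, scaled the same way as the training data:

In [ ]:
'''The held-out test images and answers, used to measure accuracy.'''
testdata = digits.data[1200:] / 16.0
testanswers = digits.target[1200:]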

In [ ]:
'''This is the feedforward function: it calculates each layer's
activations from the previous layer's outputs, the weights, and the
biases, and returns the final layer's output.'''
def feedforward(x, weights, biases):
    a = x
    # each layer pairs one weight matrix with one bias vector
    for w, b in zip(weights, biases):
        a = sigmoid(np.dot(a, w) + b)
    return a
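
To make the shapes concrete, here is a minimal sketch, assuming a single hidden layer of 20 neurons (a placeholder; choosing the hidden structure is the subject of Additional Question 1), of setting up the parameters and measuring accuracy on the test set as described earlier.

In [ ]:
'''Parameter arrays for a 64 -> 20 -> 10 network, and the accuracy
measurement described earlier: the fraction of test images whose
largest output is the correct digit.'''
hidden = 20
weights = [np.random.randn(64, hidden), np.random.randn(hidden, 10)]
biases = [np.random.randn(hidden), np.random.randn(10)]

def accuracy(weights, biases):
    guesses = [np.argmax(feedforward(x, weights, biases)) for x in testdata]
    return np.mean(np.array(guesses) == testanswers)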

In [ ]:
'''This performs stochastic gradient descent: it repeatedly trains on
small random batches of the data to move toward a minimum of the error.'''
def GradientDescent(inputs, results, batchsize, lc, epochs):
    for n in range(epochs):
        #pick random locations for input/result data
        locations = np.random.randint(0, len(inputs), batchsize)
        #create tuples (input, result) based on the random locations
        minibatch = [(inputs[i], results[i]) for i in locations]
        #train on each example in the mini-batch
        for example in minibatch:
            train(example, lc)
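
GradientDescent calls a train function that performs one backpropagation step; the most recent version is in NN_3, per the note at the top. As a stand-in, here is a minimal sketch of what one such step could look like for the two-layer (64 -> 20 -> 10) network sketched above, updating the global weights and biases.

In [ ]:
'''A sketch of one backpropagation step for the two-layer network:
forward pass, error at each layer, then a gradient-descent update
scaled by the learning constant lc.'''
def sigmoid_prime(z):
    # derivative of the sigmoid, needed to backpropagate the error
    return sigmoid(z) * (1 - sigmoid(z))

def train(example, lc):
    x, target = example
    # forward pass, keeping each layer's weighted input (z) and output (a)
    z1 = np.dot(x, weights[0]) + biases[0]
    a1 = sigmoid(z1)
    z2 = np.dot(a1, weights[1]) + biases[1]
    a2 = sigmoid(z2)
    # backward pass: output-layer error, then hidden-layer error
    delta2 = (a2 - target) * sigmoid_prime(z2)
    delta1 = np.dot(weights[1], delta2) * sigmoid_prime(z1)
    # nudge the weights and biases downhill on the squared error
    weights[1] -= lc * np.outer(a1, delta2)
    biases[1] -= lc * delta2
    weights[0] -= lc * np.outer(x, delta1)
    biases[0] -= lc * delta1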

Additional Question 1

In order to find the most efficient setup for this problem, I will run several tests with different numbers of hidden neurons and graph the results by accuracy and time. I will do the same for the learning constant, finding which value produces the most accurate or fastest program. From this, I will try to identify an "optimal" program, with the best balance between speed and accuracy.

I will test the following hidden network setups:

20 neurons in 1 layer

10 neurons in 2 layers

5 neurons in 4 layers

4 neurons in 5 layers

These all have the same total number of neurons, but in different layers, so this will primarily test the optimal "shape" of a network.

The learning constant is simpler to vary: I will test values between 0.01 and 0.1, and if the results end up too similar, I will widen the range. A sketch of this sweep appears below.
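
Here is a minimal sketch of that sweep; run_experiment is a hypothetical helper (not yet written) that would build a network with the given hidden shape, train it, and return its accuracy and running time.

In [ ]:
'''A sketch of the Additional Question 1 sweep. Each entry in shapes
lists the neurons per hidden layer; all four setups use 20 neurons total.'''
shapes = [[20], [10, 10], [5, 5, 5, 5], [4, 4, 4, 4, 4]]
learning_constants = np.linspace(0.01, 0.1, 10)

results = {}
for shape in shapes:
    for lc in learning_constants:
        # hypothetical helper: build, train, and time a network
        # results[(tuple(shape), lc)] = run_experiment(shape, lc)
        pass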


In [ ]:

Additional Question 2

Examining the sigmoid function will work similarly to the previous problem. I will test various types of sigmoid function with the program, looking for changes in accuracy and speed. By default, my program uses the function $\frac{1}{1 + e^{-x}}$ for the sigmoid, but I will also use the following functions:

$\frac{x}{\sqrt{1 + x^2}}$

$\tanh(x)$

The piecewise function:

$f(x) = \begin{cases} -1 & x < -1 \\ x & -1 \le x \le 1 \\ 1 & x > 1 \end{cases}$

I will see if any of these functions make the network train faster or provide better accuracy; the cell below sketches them.
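
A minimal sketch of the three alternatives (the function names here are my own labels, not established terminology):

In [ ]:
'''The alternative activation functions described above. Each can be
swapped in for sigmoid() in feedforward. Note that the piecewise
function is exactly np.clip(x, -1, 1).'''
def root_sigmoid(x):
    # x / sqrt(1 + x^2): a smooth S-curve with outputs in (-1, 1)
    return x / np.sqrt(1 + x**2)

def tanh_sigmoid(x):
    return np.tanh(x)

def piecewise_sigmoid(x):
    # -1 for x < -1, x for -1 <= x <= 1, 1 for x > 1
    return np.clip(x, -1, 1)

xs = np.linspace(-10, 10, 100)
for f in (root_sigmoid, tanh_sigmoid, piecewise_sigmoid):
    plt.plot(xs, f(xs), label=f.__name__)
plt.legend()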


In [ ]: